The Communication Complexity of Distributed ε-Approximations
Authors
Abstract
Data summarization is an effective approach to dealing with the “big data” problem. While data summarization problems have traditionally been studied in the streaming model, the focus is starting to shift to distributed models, as distributed/parallel computation seems to be the only viable way to handle today’s massive data sets. In this paper, we study ε-approximations, a classical data summary that, intuitively speaking, approximately preserves the density of the underlying data set over a certain range space. We consider the problem of computing ε-approximations for a data set that is held jointly by k players, and give general communication upper and lower bounds that hold for any range space whose discrepancy is known.

∗Zengfeng Huang is supported by the Danish National Research Foundation grant DNRF84 through the Center for Massive Data Algorithmics (MADALGO).
†Ke Yi is supported by HKRGC under grants GRF-621413 and GRF-16211614.
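To make the definition concrete: under the standard definition, a weighted set A is an ε-approximation of P for a range space R if, for every range r in R, |w(A ∩ r)/w(A) − |P ∩ r|/|P|| ≤ ε. The abstract does not describe a protocol, so the sketch below is a hypothetical illustration under stated assumptions, not the paper's method. It assumes points on a line with closed intervals as the range space (the paper covers any range space of known discrepancy). Each of the k players builds a weighted summary by keeping one representative per block of sorted points, and a coordinator takes the union of the summaries. The function names `weighted_summary`, `distributed_summary` and `max_interval_error` are hypothetical.

```python
import math
from bisect import bisect_right
from fractions import Fraction


def weighted_summary(points, eps):
    """Hypothetical sketch: weighted eps-approximation of 1-D points for intervals.

    Sort the points, cut them into consecutive blocks of s = max(1, floor(eps * n))
    points, and keep the last point of each block with weight = block size.
    An interval cuts at most two blocks partially, so the summary's weighted
    count of any interval is off by at most s - 1 <= eps * n points.
    """
    pts = sorted(points)
    n = len(pts)
    s = max(1, math.floor(Fraction(eps) * n))  # exact arithmetic avoids float rounding
    summary = []
    for start in range(0, n, s):
        block = pts[start:start + s]
        summary.append((block[-1], len(block)))  # (representative, weight)
    return summary


def distributed_summary(player_sets, eps):
    """Naive baseline (assumption, not the paper's protocol): every player sends
    its own weighted summary and the coordinator takes the union. Per-player
    errors add up to at most sum(eps * n_i) = eps * n points in any interval."""
    merged = []
    for pts in player_sets:
        merged.extend(weighted_summary(pts, eps))
    return merged


def max_interval_error(points, summary):
    """Exact max over closed intervals r of |w(S & r)/w(S) - |P & r|/|P||.

    Assumes both the point set and the summary are nonempty.
    """
    pts = sorted(points)
    reps = sorted(summary)
    n = len(pts)
    total_w = sum(w for _, w in reps)
    rep_vals = [v for v, _ in reps]
    prefix_w = [0]  # prefix_w[i] = total weight of the first i representatives
    for _, w in reps:
        prefix_w.append(prefix_w[-1] + w)
    values = sorted(set(pts) | set(rep_vals))
    # diffs[t] = (summary CDF - data CDF) at the t-th distinct value;
    # diffs[0] = 0 stands for "below every value".
    diffs = [Fraction(0)]
    for v in values:
        w_s = prefix_w[bisect_right(rep_vals, v)]
        diffs.append(Fraction(w_s, total_w) - Fraction(bisect_right(pts, v), n))
    # Interval [values[i], values[j-1]] has signed error diffs[j] - diffs[i],
    # so the worst interval gives max(diffs) - min(diffs).
    return max(diffs) - min(diffs)


if __name__ == "__main__":
    # Hypothetical data: k = 3 players, each holding a disjoint slice of 0..299.
    players = [list(range(i, 300, 3)) for i in range(3)]
    eps = Fraction(1, 20)
    merged = distributed_summary(players, eps)
    all_points = [p for pts in players for p in pts]
    print(len(merged))  # 60: each player sends 20 weighted points
    print(max_interval_error(all_points, merged) <= eps)  # True
```

This baseline sends about 1/ε weighted points per player, roughly k/ε in total. The abstract's upper and lower bounds, which depend on the discrepancy of the range space, are not reproduced by this sketch.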
Similar papers
Continuous Matrix Approximation on Distributed Data (arXiv:1404.7571v1 [cs.DB], 30 Apr 2014)
Tracking and approximating data matrices in streaming fashion is a fundamental challenge. The problem requires more care and attention when data comes from multiple distributed sites, each receiving a stream of data. This paper considers the problem of “tracking approximations to a matrix” in the distributed streaming model. In this model, there are m distributed sites each observing a distinct...
What Is the Complexity of Stieltjes Integration?
We study the complexity of approximating the Stieltjes integral $\int_0^1 f(x)\,dg(x)$ for functions f having r continuous derivatives and functions g whose s-th derivative has bounded variation. Let r(n) denote the nth minimal error attainable by approximations using at most n evaluations of f and g, and let comp(ε) denote the ε-complexity (the minimal cost of computing an ε-approximation). We show t...
Communication Efficient, Sample Optimal, Linear Time Locally Private Discrete Distribution Estimation
We consider discrete distribution estimation over k elements under ε-local differential privacy from n samples. The samples are distributed across users who send privatized versions of their sample to the server. All previously known sample optimal algorithms require linear (in k) communication complexity in the high privacy regime (ε < 1), and have a running time that grows as n · k, which can...
Approximate Hamming Distance in a Stream
We consider the problem of computing a (1+ε)-approximation of the Hamming distance between a pattern of length n and successive substrings of a stream. We first look at the one-way randomised communication complexity of this problem. We show the following: If Alice and Bob both share the pattern and Alice has the first half of the stream and Bob the second half, then there is an $O(\varepsilon^{-4}\log^2 n)$ b...